The Internal Data Format for ctmc_fit

The function ctmc_fit expects the data to be structured as follows:


In [1]:
data = [([0, 1, 2, 1], [2.2, 3.35, 9.4, 1.3]), 
        ([1, 0, 1], [4.0, 1.25, 1.7])]

Each example (event chain) is one element of the list data.

  • The first entry of an example is a list of states,
  • the second entry is a list of the time periods each state lasted.
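The layout described above can be checked before fitting. The following is a minimal sketch and not part of ctmc_fit: the helper name check_chains and its numstates parameter are hypothetical, and it assumes that states are integer labels in range(numstates) and that durations are non-negative.

```python
def check_chains(data, numstates):
    """Raise ValueError if data does not follow the (states, times) layout.

    Hypothetical helper for illustration only; not part of ctmc_fit.
    """
    for k, (states, times) in enumerate(data):
        # every state needs exactly one duration
        if len(states) != len(times):
            raise ValueError(
                f"example {k}: {len(states)} states but {len(times)} times")
        # assumed convention: states are labeled 0 .. numstates - 1
        for s in states:
            if not 0 <= s < numstates:
                raise ValueError(
                    f"example {k}: state {s} outside 0..{numstates - 1}")
        # assumed convention: a duration cannot be negative
        for t in times:
            if t < 0:
                raise ValueError(f"example {k}: negative duration {t}")
    return True


data = [([0, 1, 2, 1], [2.2, 3.35, 9.4, 1.3]),
        ([1, 0, 1], [4.0, 1.25, 1.7])]
check_chains(data, numstates=3)
```

A chain whose two lists differ in length, such as ([0, 1], [1.0]), would raise a ValueError here instead of failing later inside the accumulation loop.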

How does it work in ctmc_fit?

Initialize variables


In [7]:
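import numpy as np
numstates = 3  # states are labeled 0, 1, 2
statetime = np.zeros(numstates, dtype=float)  # total time spent in each state
transcount = np.zeros(shape=(numstates, numstates), dtype=int)  # transcount[i, j]: number of i -> j transitions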

Loop over all examples, accumulating the time spent in each state and counting the transitions between states.


In [12]:
for states, times in data:
    for i, s in enumerate(states):
        # add the time spent in state s during this visit
        statetime[s] += times[i]
        # from the second state on, count the transition previous -> s
        if i > 0:
            transcount[states[i-1], s] += 1
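The same accumulation can also be written without the inner Python loop. This is a sketch of an alternative formulation, not the code used in ctmc_fit: it relies on np.add.at, which adds correctly even when a state index occurs several times within one chain.

```python
import numpy as np

data = [([0, 1, 2, 1], [2.2, 3.35, 9.4, 1.3]),
        ([1, 0, 1], [4.0, 1.25, 1.7])]

numstates = 3
statetime = np.zeros(numstates, dtype=float)
transcount = np.zeros((numstates, numstates), dtype=int)

for states, times in data:
    states = np.asarray(states)
    # unbuffered add: repeated indices are accumulated, not overwritten
    np.add.at(statetime, states, times)
    # pair each state with its successor to count the transitions
    np.add.at(transcount, (states[:-1], states[1:]), 1)
```

Plain fancy-index assignment (statetime[states] += times) would apply only one update per repeated index, which is why np.add.at is used here.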

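Starting from the zero-initialized arrays, one pass over data gives the intermediate results below. Re-running the loop cell without resetting the arrays adds to the previous totals.


In [13]:
statetime


Out[13]:
array([ 3.45, 10.35,  9.4 ])

In [14]:
transcount


Out[14]:
array([[0, 2, 0],
       [1, 0, 1],
       [0, 1, 0]])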
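These two arrays are what the standard maximum likelihood estimate of a CTMC generator matrix needs: the rate from state i to state j is the number of i -> j transitions divided by the total time spent in i, and each diagonal entry is chosen so that its row sums to zero. The sketch below applies that textbook formula to a single pass over data. Whether ctmc_fit performs exactly these steps is an assumption, and the name genmat is illustrative.

```python
import numpy as np

data = [([0, 1, 2, 1], [2.2, 3.35, 9.4, 1.3]),
        ([1, 0, 1], [4.0, 1.25, 1.7])]

numstates = 3
statetime = np.zeros(numstates, dtype=float)
transcount = np.zeros((numstates, numstates), dtype=int)

# single pass: accumulate holding times and transition counts
for states, times in data:
    for i, s in enumerate(states):
        statetime[s] += times[i]
        if i > 0:
            transcount[states[i - 1], s] += 1

# off-diagonal rates q_ij = n_ij / T_i
# (assumes every state was visited, so that T_i > 0)
genmat = transcount / statetime[:, None]

# diagonal: negative sum of the off-diagonal rates in the same row
np.fill_diagonal(genmat, 0.0)
np.fill_diagonal(genmat, -genmat.sum(axis=1))
```

A state that never appears in data would have statetime equal to zero and cause a division by zero, so such states need separate handling in practice.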

In [6]:
# Alternative (not executed here): keep the transition counts in a sparse matrix
#from scipy.sparse import lil_matrix
#transcount = lil_matrix((numstates, numstates), dtype=int)
#transcount.toarray()